A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
In this paper, we consider how a reinforcement learning agent that is tasked with solving a sequence of reinforcement learning problems (a sequence of Markov decision processes) can use knowledge acquired early in its lifetime to improve its ability to solve new problems. We argue that previous experience with similar problems can provide an agent with information about how it should explore when facing a new but related problem. We show that the search for an optimal exploration strategy can itself be formulated as a reinforcement learning problem, and we demonstrate that such a strategy can leverage patterns found in the structure of related problems. We conclude with experiments that show the benefits of optimizing an exploration strategy using our proposed framework.
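To make the idea of treating exploration as its own reinforcement learning problem more concrete, the following is a minimal sketch and not the authors' algorithm. Everything in it is an assumption made for illustration: the toy one-step tasks, the epsilon-greedy agent, the names `run_lifetime` and `train_advisor`, and the use of a REINFORCE-style update with a running baseline. An "advisor" policy chooses the agent's exploratory actions, and the advisor is rewarded with the return the agent collects over a whole lifetime on a sampled task.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def run_lifetime(theta, goal_action, rng, episodes=20, n_actions=4,
                 eps=0.3, lr=0.5):
    """One agent lifetime on one hypothetical one-step task in which
    only `goal_action` yields reward 1 (an assumption for illustration)."""
    q = np.zeros(n_actions)            # the agent's own value estimates
    advisor_probs = softmax(theta)     # advisor: distribution over exploratory actions
    grad = np.zeros_like(theta)        # sum of grad log-prob of the advisor's choices
    total = 0.0
    for _ in range(episodes):
        if rng.random() < eps:
            # Explore: the advisor, not a uniform distribution, picks the action.
            a = int(rng.choice(n_actions, p=advisor_probs))
            onehot = np.zeros(n_actions)
            onehot[a] = 1.0
            grad += onehot - advisor_probs   # gradient of log softmax
        else:
            a = int(np.argmax(q))            # exploit the agent's estimates
        r = 1.0 if a == goal_action else 0.0
        q[a] += lr * (r - q[a])              # simple incremental value update
        total += r
    return total, grad

def train_advisor(task_goals, n_actions=4, lifetimes=500, step=0.01, seed=0):
    """Outer loop: the advisor's return is the agent's lifetime return."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_actions)
    baseline = 0.0
    for _ in range(lifetimes):
        goal = int(rng.choice(task_goals))   # sample a related task
        ret, grad = run_lifetime(theta, goal, rng, n_actions=n_actions)
        theta += step * (ret - baseline) * grad   # REINFORCE-style update
        baseline += 0.1 * (ret - baseline)        # running average baseline
    return theta
```

For example, `softmax(train_advisor([2, 3]))` gives the advisor's learned distribution over exploratory actions when every task in the family rewards either action 2 or action 3. The shared structure across tasks (actions 0 and 1 never pay off) is the kind of pattern an exploration strategy could pick up in this toy setting.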
Reviews: A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
Even after the discussion and the author response, there was still some disagreement between the reviewers. The paper proposes a simple yet novel and very interesting idea. There are still a few concerns about clarity, but those can be fixed in the final version (see updated reviews). Overall, this is a solid paper that (as always) would benefit from a more thorough empirical evaluation. One reviewer proposed adding a baseline: a domain-randomized robust policy trained on various tasks.
A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
Garcia, Francisco; Thomas, Philip S.
Papers published at the Neural Information Processing Systems Conference.